The nucleotide sequence of the peplomer (E2) gene of MHV-A59 was determined from a set of overlapping cDNA 
clones. The E2 gene encodes a protein of 1324 amino acids including a hydrophobic signal peptide. A second large 
hydrophobic domain is found near the COOH terminus and probably represents the membrane anchor. Twenty 
glycosylation sites are predicted. Cleavage of the E2 protein results in two different 90K species, 90A and 90B (L. S. 
Sturman, C. S. Ricard, and K. V. Holmes (1985) J. Virol. 56, 904-911), and activates cell fusion. Protein sequencing of 
the trypsin-generated N-terminus revealed the position of the cleavage site. 90A and 90B could be identified as the 
C-terminal and the N-terminal parts, respectively. Amino acid sequence comparison of the A59 and JHM E2 proteins 
showed extensive homology and revealed a stretch of 89 amino acids in the 90B region of the A59 E2 protein that is 
absent in JHM. © 1987 Academic Press, Inc. 


INTRODUCTION 


Murine hepatitis viruses (MHV) are coronaviruses which cause a variety of diseases,
including hepatitis and encephalomyelitis, in the natural host (Wege et al., 1982).
They are studied extensively, since MHV is a useful animal model for virus-induced
demyelination and because coronaviruses possess a unique mode of replication
(Siddell et al., 1983).

The infectious genome of MHV consists of a single-stranded RNA of about 20 kb which
is associated with a single protein species with a mol wt of 54K in a helical
nucleocapsid. Two membrane-associated proteins are present in the virions: the large
glycoprotein E2, forming the characteristic surface projections or peplomers, and the
smaller membrane glycoprotein E1 (26.5K) (Armstrong et al., 1984a). The peplomer
protein, encoded by mRNA 3 (Rottier et al., 1981), is synthesized on ribosomes bound
to the rough endoplasmic reticulum (RER), where it is cotranslationally glycosylated
(Sturman and Holmes, 1983; Holmes et al., 1984) and subsequently acylated, probably
during transport through the Golgi apparatus (Niemann and Klenk, 1981; Sturman et al.,
1985). MHV virions bud from the RER and Golgi membranes and are apparently
transported to the exterior by the internal secretory apparatus. Two forms of the E2
protein are present on the surface of the virion, the 180K and the 90K species
(Sturman and Holmes, 1977). Recently it has been shown that the 90K protein consists
of two different species, 90A and 90B, arising from proteolytic cleavage of the 180K
protein (Sturman et al., 1985). This cleavage activates cell fusion, and the ratio of
180K to 90K proteins is host dependent (Sturman et al., 1985; Frana et al., 1985). It
has been suggested that such host-dependent differences in the processing of E2 may
be important for cytopathic effects, virulence, and tissue tropism of the murine
coronaviruses (Frana et al., 1985).

The peplomer protein is involved in cell attachment (Collins et al., 1982) and is the
target for neutralizing antibodies (Fleming et al., 1983). E2 plays an important role
in the pathology of MHV. Buchmeier et al. (1984) showed that in MHV-JHM-infected mice
passive transfer of neutralizing monoclonal antibodies recognizing E2 prevented fatal
infection by wild-type virus. Instead, a chronic demyelinating disease developed.
These changed pathogenic properties seem to be a result of mutations in E2 (Dalziel
et al., 1986; Fleming et al., 1986). To understand the biological and pathogenic
properties of MHV at the molecular level, the primary structure of E2 and data on its
processing are essential. Here we report the cDNA cloning and sequence analysis of
the gene encoding the E2 protein of MHV-A59. By direct amino acid sequence analysis
of the N-terminal part of the 90A species we were also able to identify the trypsin
cleavage site. E2 is the main structural protein determining strain differences
because it shows significant antigenic polymorphism, in contrast to the other
structural proteins (Talbot and Buchmeier, 1985). It will probably reflect the major
differences between related coronavirus species. To localize these differences we
have compared the E2 gene sequence of the MHV-A59 strain with that of strain JHM
published recently (Schmidt et al., 1987).


MATERIALS AND METHODS 


Identification of the trypsin cleavage site by amino 
acid sequence analysis 


Virus purification was carried out with modifications as described by Sturman et al.
(1980, 1985). Purified 90A and 90B proteins were prepared from trypsin-treated
virions and the 180K E2 protein was prepared from untreated virus. Following
incubation with trypsin, 10 µg/ml, in TMEN, pH 6.5 (50 mM Tris-maleate, 1 mM EDTA,
100 mM NaCl), at 37° for 30 min, soybean trypsin inhibitor, 50 µg/ml, was added for
30 min at 4°. Virus was then sedimented at 24,000 rpm in an SW 28 rotor at 4° for
2.5 hr. E2 was extracted with Triton X-114, and 90A and 90B were separated as
described previously by HPLC on HPHT (hydroxyapatite) columns in sodium dodecyl
sulfate (SDS) (Ricard and Sturman, 1985). Uncleaved (180K) E2 was separated from 90K
species by HPLC size exclusion chromatography with a Bio-Sil TSK (Bio-Rad) guard
column and Bio-Sil TSK 400, 7.5 × 300 mm, and Spherogel TSK 4000 (Altex),
7.5 × 300 mm, columns connected in series. SDS was removed from purified proteins by
ion pair extraction with acetone:triethylamine:acetic acid:water, 85:5:5:5
(Henderson et al., 1979). Proteins were washed with trifluoroacetic acid,
lyophilized, and dissolved in trifluoroacetic acid. The amino terminal sequence was
determined by automated Edman degradation using an Applied Biosystems gas phase
sequencer. Phenylthiohydantoin (PTH) amino acids were identified by HPLC.


cDNA synthesis and cloning 


Viral genomic RNA and poly(A)-containing intracellular RNAs were isolated from
purified MHV-A59 virions and infected cells, respectively (Spaan et al., 1981).
Procedures for synthesis of cDNA were essentially identical to those described by
Dowling (1983) and Gubler and Hoffman (1983). For the synthesis of the
single-stranded cDNA, pentanucleotides and specific primers were used. Full details
will be presented elsewhere (P. J. Bredenbeek et al., manuscript in preparation).

After homopolymer tailing of the double-stranded cDNA (Peacock, 1981) or digestion
with restriction endonucleases, the cDNA was annealed to dG-tailed pUC9 DNA
(Pharmacia) or ligated to pEMBL DNA (Dente et al., 1983), respectively.
Transformation was carried out by adding the annealed or ligated DNA to Escherichia
coli strain JM101 or JM109 competent cells (Messing, 1983), prepared by the method
described by Hanahan (1983), which were subsequently plated on petri dishes
containing 25 µg/ml ampicillin.


Screening and analysis of recombinants 


Plasmid DNA from ampicillin-resistant colonies obtained after transformation of the
ligated restriction fragments was prepared according to the method described by
Birnboim and Doly (1979). The mapping of the cDNA clones on the genome will be
described in detail elsewhere (P. J. Bredenbeek et al., manuscript in preparation).


Formaldehyde-agarose gel analysis and 
hybridization 


Poly(A)-containing RNA from MHV-infected cells was denatured in the presence of
formaldehyde and separated in a 1.5% agarose-formaldehyde gel (Lehrach et al., 1977).
After electrophoresis the gel was dried on Whatman 3MM paper and subsequently
incubated with a kinase-labeled oligonucleotide probe according to Meinkoth and Wahl
(1984). Hybridization and washing temperatures were 5-10° below the calculated Td.
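
The text refers to a calculated duplex stability temperature for the probe but does not
state the formula. As a minimal sketch, assuming the common rule of thumb for short
oligonucleotides (the Wallace rule, 2° per A or T plus 4° per G or C), the temperature
window could be estimated as follows. The function names and the 20-mer are
hypothetical and are not taken from the paper.

```python
def wallace_td(oligo: str) -> int:
    """Estimate the dissociation temperature (deg C) of a short oligonucleotide.

    Assumption: Wallace rule, 2 deg per A/T plus 4 deg per G/C.
    """
    seq = oligo.upper()
    at = seq.count("A") + seq.count("T")
    gc = seq.count("G") + seq.count("C")
    if at + gc != len(seq):
        raise ValueError("oligo must contain only A, C, G, T")
    return 2 * at + 4 * gc


def hybridization_window(oligo: str, low: int = 5, high: int = 10) -> tuple:
    """Return the (min, max) temperature lying `high` to `low` deg below Td."""
    td = wallace_td(oligo)
    return td - high, td - low


# Hypothetical 20-mer with 10 A/T and 10 G/C positions (not a probe from the paper).
probe = "ATGCATGCATGCATGCATGC"
print(wallace_td(probe), hybridization_window(probe))
```

The rule is only a rough guide for probes of about 14 to 20 nucleotides; salt
concentration and mismatches are ignored here.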


Oligonucleotide synthesis 


Oligonucleotides were prepared as described previously (Niesters et al., 1986) or
were synthesized using a DNA synthesizer, Biosearch Model 8600, and subsequently
purified by HPLC.


DNA sequence analysis 


DNA fragments were prepared by digestion with a variety of restriction enzymes and
ligated either as a mixture or as single fragments purified from agarose gels into
the M13 vectors mp8 and mp9 (Messing, 1983). White plaques were screened for viral
inserts using pentamer-primed probes from cDNA clones (Feinberg and Vogelstein, 1983;
Roberts and Wilson, 1985). Single-stranded M13 DNA was isolated and used for sequence
analysis using the dideoxynucleotide chain termination procedure of Sanger et al.
(1977). Sequence data were assembled and analyzed using the computer programs created
by Staden (1986).


MHV-A59 E2 PROTEIN STRUCTURE AND CLEAVAGE SITE 


Protein sequence homology searches 


The predicted amino acid sequence was compared 
to other sequences and to the NBRF Protein Bank 
using the FASTP program set of Lipman and Pearson 
(1985) and the DIAGON program of Staden (1982). 


RESULTS 


cDNA cloning, mapping of recombinant plasmids, 
and sequence analysis 


When we started this sequence study, a number of cDNA clones against cellular RNAs of
MHV-A59 were already available. Mapping by hybridization to the viral mRNAs (P. J.
Bredenbeek et al., manuscript in preparation) and sequence analysis indicated that
the overlapping clones 95, 918, and 85 were positioned around the 5' end of the E2
gene (Fig. 1). Clone 853 was mapped beyond the 3' end in gene D. Oligonucleotide 7
(OL 7), complementary to a sequence in the 3' end of clone 85, and oligonucleotide 8
(OL 8), based upon the sequence of clone 853, were synthesized and used to screen the
new random genomic cDNA library. Several positive recombinant DNA clones were
isolated and characterized by restriction site mapping. This permitted construction
of a continuous map of approximately 5 kb containing the complete unique region of
mRNA 3 encoding the E2 protein.

The large insert of clone B24 was isolated and subsequently digested with restriction
endonuclease HpaII or TaqI. The complete digests were ligated into M13 mp9. Initial
selection of subclones overlapping the consensus sequence of clones 95, 918, and 85
was performed by hybridizing a probe from clone B60 to phage DNA. The sequence
strategy is summarized in Fig. 1.

Fig. 1. Cloning and sequencing strategy of the MHV-A59 E2 gene and hydrophobicity pattern of the predicted amino acid sequence. Open 
boxes represent open reading frames in the coding regions of RNA 2 (B), RNA 3 (E2), and RNA 4 (D) transposed on the genome. Vertical bars 
indicate homology regions in the intergenic sequences of MHV-A59. Numbers represent the nucleotide distance to the start of the 3' poly(A) tail. 
The vertical arrow points at the trypsin cleavage site. The double arrow marks the region in MHV-A59 absent in strain JHM. Small boxes 
represent the synthetic oligonucleotides used in cloning, sequencing, and hybridizations, numbered when referred to in the text. cDNA clones 
are indicated by horizontal lines. Extent and direction of sequencing is shown by means of the arrows below. Symbols indicating restriction sites 
are explained in the figure. The hydrophobicity pattern was generated using the HYDROPLOT program created by Staden (1986), modified with 
hydrophobicity data from Eisenberg et al. (1982). Above the line is hydrophobic. 


Fig. 2. Nucleotide and predicted amino acid sequence of the MHV-A59 E2 gene. Numbering starts at the ATG codon (arrow) at position 
-7403 from the poly(A) tail. Dots mark the N-terminal signal sequence and the C-terminal membrane anchor. The trypsin cleavage generated 
N-terminal amino acid sequence of 90A as analyzed by Edman degradation is underlined. The cleavage site between 90B and 90A is indicated 
by an arrowhead. Potential glycosylation sites are indicated by boxed asparagine residues. The MHV-JHM amino acid sequence (Schmidt et al., 
1987) is printed where differences with MHV-A59 occur. Asterisks represent deletions. Intergenic homology regions are boxed. 


Nucleotide and amino acid sequence 


The consensus nucleotide sequence shows an open reading frame (ORF) of 3972
nucleotides stretching from position -7403 to -3429 from the poly(A) tail. The
initiation codon lies immediately adjacent to a short sequence which is fully
compatible with the intergenic homology sequence 5'-(A/T)AATC(T/C)AAAC-3'
(Bredenbeek et al., 1987). A similar sequence is found 28 nucleotides downstream from
the end of the ORF (Fig. 2). There were no alternative ORFs longer than 60 amino
acids in the unique region of mRNA 3. The large ORF is therefore identified as the
coding sequence of the E2 protein.
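
The search for alternative ORFs can be illustrated with a short sketch. It scans the
three forward reading frames for ATG-initiated ORFs of at least a given number of
codons. The function name, the inclusive threshold, and the toy sequence are
assumptions made for illustration; the programs actually used were those of Staden
(1986).

```python
STOPS = {"TAA", "TAG", "TGA"}


def find_orfs(seq: str, min_codons: int = 60) -> list:
    """Return (start, stop_index, n_codons) for ATG-initiated ORFs.

    Only the three forward frames are scanned. `start` and `stop_index` are
    0-based positions of the ATG and of the stop codon; `n_codons` excludes
    the stop codon. ORFs that run off the end without a stop are ignored.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3
                if n_codons >= min_codons:
                    orfs.append((start, i, n_codons))
                start = None
    return orfs


# Toy sequence (hypothetical): ATG, five GCT codons, then a TAA stop.
toy = "CC" + "ATG" + "GCT" * 5 + "TAA" + "G"
print(find_orfs(toy, min_codons=3))

# The ORF length quoted in the text corresponds to 1324 codons.
print(1324 * 3)
```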

The ORF encodes a protein of 1324 amino acids with some typical features. The
N-terminal region (Fig. 2) contains a stretch of amino acids consistent with a signal
sequence (Von Heijne, 1986). Another region of high hydrophobicity is found at the
C-terminus and probably represents a membrane anchor. In the hydrophobicity plot this
region appears as a strong symmetrical peak (Fig. 1). It starts with a series of
nonpolar amino acids spanning the membrane and ends with a cluster of cysteine
residues; it is followed by a number of charged residues which are probably located
at the interior of the virion.
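
A hydrophobicity profile of this kind is a sliding-window average over per-residue
hydropathy values. The sketch below uses the Kyte-Doolittle scale, which is swapped in
here because its values are widely tabulated; the plot in Fig. 1 was made with the
HYDROPLOT program and Eisenberg data, so the numbers will differ. The function name,
window size, and toy sequence are assumptions.

```python
# Kyte-Doolittle hydropathy values (positive = hydrophobic).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_profile(protein: str, window: int = 19) -> list:
    """Return the mean hydropathy of every full window along the sequence."""
    scores = [KD[aa] for aa in protein.upper()]
    if window > len(scores):
        return []
    return [
        sum(scores[i:i + window]) / window
        for i in range(len(scores) - window + 1)
    ]


# Toy sequence (hypothetical): a hydrophobic core flanked by charged residues.
toy = "DDDDD" + "LLLLL" + "KKKKK"
profile = hydropathy_profile(toy, window=5)
peak = profile.index(max(profile))
print(peak, round(profile[peak], 2))
```

A window of about 19 residues is a common choice when looking for membrane-spanning
segments; the short window above only suits the toy sequence.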


The ORF potentially codes for an apoprotein with a mol wt of 146K, which is in the
range reported by several authors (see Siddell et al., 1983; Repp et al., 1985).
Based on the assumption that Asn-X-Thr and Asn-X-Ser (X not being Pro) signals can be
glycosylated and assuming that the extreme C-terminal site is located in the interior
and thus unlikely to be used, we could identify 20 potential sites for
N-glycosylation (Neuberger et al., 1972). These are enough to add the extra 35K
needed to reach the Mr of 180K required for the E2 protein. Acylation of E2 has been
reported (Sturman et al., 1985) but little is known about acylation signals; we could
therefore not determine its contribution to the weight of the protein.
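
The sequon rule stated above (Asn-X-Thr or Asn-X-Ser, X not being Pro) is easy to
express as a pattern search. The following is a minimal sketch; the function name and
the toy sequence are hypothetical, and the exclusion of sites judged to lie inside the
virion is left to the reader because it depends on the topology model.

```python
import re

# Lookahead so that overlapping sequons (e.g. N-N-T-S) are all reported.
SEQUON = re.compile(r"(?=N[^P][ST])")


def n_glycosylation_sites(protein: str) -> list:
    """Return 1-based positions of Asn residues in Asn-X-Ser/Thr, X != Pro."""
    return [m.start() + 1 for m in SEQUON.finditer(protein.upper())]


# Toy sequence (hypothetical): N2 and N10 and N14 qualify, N6 is followed by Pro.
toy = "MNVTANPSGNGSKNAT"
print(n_glycosylation_sites(toy))
```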


Identification of the trypsin cleavage site 


The two 90K cleavage products, designated 90A and 90B, can be separated by
SDS-hydroxyapatite chromatography (Ricard and Sturman, 1985). The location of the
trypsin cleavage site and the relationship of 90A and 90B to the uncleaved protein
were determined by comparison of the amino terminal sequence identified by Edman
degradation with the sequence deduced from analysis of cDNA.

The amino terminal sequence of 90A,
Ser-Val-Ser-Thr-Gly-Tyr-Arg-Leu-Thr-Thr-Phe-Glu-Pro-Tyr-Thr-Pro-Met-Leu, is identical
to the sequence underlined in Fig. 2. The trypsin cleavage site can thus be
positioned between residues 717 and 718 in the amino acid sequence. The 90B and 180K
species appear to possess blocked amino termini as no definitive amino terminal
sequences could be determined.
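
Positioning the cleavage site amounts to finding the Edman-derived peptide within the
cDNA-derived protein sequence. A minimal sketch follows; the helper names are
hypothetical, and the flanking residues of the toy protein are invented for
illustration, so the reported position is not the 717/718 position of the real
sequence.

```python
THREE_TO_ONE = {
    "Ala": "A", "Arg": "R", "Asn": "N", "Asp": "D", "Cys": "C",
    "Gln": "Q", "Glu": "E", "Gly": "G", "His": "H", "Ile": "I",
    "Leu": "L", "Lys": "K", "Met": "M", "Phe": "F", "Pro": "P",
    "Ser": "S", "Thr": "T", "Trp": "W", "Tyr": "Y", "Val": "V",
}


def to_one_letter(peptide: str) -> str:
    """Convert a hyphen-separated three-letter peptide to one-letter code."""
    return "".join(THREE_TO_ONE[res] for res in peptide.split("-"))


def locate_cleavage(protein: str, n_terminal_peptide: str):
    """Return the 1-based residue numbers flanking the cleavage site.

    The peptide is the new N-terminus created by cleavage, so the site lies
    between the residue preceding the match and the first matched residue.
    Returns None if the peptide is not found.
    """
    idx = protein.find(n_terminal_peptide)
    if idx == -1:
        return None
    return idx, idx + 1


edman = "Ser-Val-Ser-Thr-Gly-Tyr-Arg-Leu-Thr-Thr-Phe-Glu-Pro-Tyr-Thr-Pro-Met-Leu"
peptide = to_one_letter(edman)

# Toy protein: hypothetical flanks around the peptide reported in the text.
toy_protein = "MKRAHR" + peptide + "VNDS"
print(peptide, locate_cleavage(toy_protein, peptide))
```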

The identification of a signal sequence and of a membrane anchor, the determination
of the amino terminal sequence of 90A, and the finding that 90A but not 90B is
acylated (Sturman et al., 1985) allow us to conclude that the structure of E2 is
NH2-90B-90A-COOH. The cleavage products 90A and 90B have ORFs with lengths of 606 and
717 amino acids, respectively, corresponding to coding capacities for apoproteins of
66K and 79K.


Comparison of the peplomer protein sequences of 
MHV strains A59 and JHM 


Considerable polymorphism has been seen in the E2 glycoprotein of coronaviruses
(Talbot and Buchmeier, 1985). To localize the differences we have compared the
predicted amino acid sequences of the E2 protein of MHV strains A59 and JHM (Fig. 2).

The two proteins are highly conserved: there is an overall homology of 93% and 90A is
more conserved than 90B (96 and 89%, respectively). However, there is a remarkable
difference: starting at amino acid (aa) 454 we find a stretch of 89 aa (267
nucleotides) that is not present in the E2 sequence of JHM. To rule out the
possibility that this additional sequence is the product of cDNA cloning artifacts,
we isolated and sequenced several independent cDNA clones covering the region (G10,
A11, and B60, Fig. 1). They all contained the additional sequence. We then
synthesized an oligonucleotide (OL 53, Fig. 1) complementary to nucleotide position
1423 to 1442 in the A59 sequence and hybridized it to MHV-A59 poly(A)-selected
messengers separated by electrophoresis. It is clear from Fig. 3 that the A59
'insertion' is an actual genomic feature, as it is found in mRNAs 3, 2, and 1. The
extra bands in the gel cannot be accounted for but have been found with other MHV
probes (data not shown) and possibly represent leaderless RNAs.
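
Homology figures of the kind quoted above can be obtained as the percentage of
identical residues over aligned positions, leaving out positions where either
sequence has a gap. The sketch below assumes the two sequences are already aligned
and padded with "-"; the function name and the toy alignment are hypothetical, and
the paper does not state exactly how its percentages were computed.

```python
def percent_identity(a: str, b: str, gap: str = "-") -> float:
    """Percentage of identical residues over gap-free aligned positions."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a, b) if x != gap and y != gap]
    if not pairs:
        return 0.0
    identical = sum(x == y for x, y in pairs)
    return 100.0 * identical / len(pairs)


# Toy alignment (hypothetical): one substitution and a two-residue gap.
seq_a = "MKTAYIAKQR"
seq_b = "MKSAY--KQR"
print(percent_identity(seq_a, seq_b))
```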

Fig. 3. Hybridization of a synthetic oligonucleotide from the MHV-A59 E2 region
absent in strain JHM to the MHV-A59 messenger RNAs. Lane A, hybridization with OL 53
(Fig. 1) specific for the additional sequence of MHV-A59. Lane B, hybridization with
an oligonucleotide complementary to part of the A59 leader sequence (Spaan et al.,
1984). RNAs are numbered according to Spaan et al. (1981).

The fact that the sequences of both strains can be perfectly aligned (when we exclude
the additional sequence) allows a nucleotide to nucleotide comparison and the
creation of a mutation table (Table 1). The sequences of the genes coding for the
nucleocapsid (N) and the matrix (E1) protein were included because their products
show little antigenic variation (Talbot and Buchmeier, 1985) and may thus be used as
references. The ratio of nonsilent to silent (N/S) mutations can be interpreted as an
indication of mutation selection. In randomly mutated sequences, when no selection
mechanism is involved, this ratio will be about 3. Lower ratios will reflect
selection against mutation whereas higher values indicate positive selection. For
functional genes, however, this ratio ranges from 0.2 to 1.7, since many mutations
will be lethal and therefore not found (Hewett-Emmett et al., 1982). The N/S ratios
for the coronaviral proteins are indeed in this range (Table 1). When we consider the
N and E1 genes as less susceptible to selective pressure we can understand the lower
ratio found for the 90A species (about half of that of the other proteins; Table 1)
as an indication of negative selection, i.e., suppression of amino acid mutations.
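
The expectation of a ratio of about 3 without selection can be checked by enumerating
every single-nucleotide substitution in the 61 sense codons of the standard genetic
code and counting how many change the encoded amino acid. This is an unweighted
sketch, not the Nei and Gojobori (1986) method used for Table 1; it ignores codon
usage and base composition, and counts changes to a stop codon as nonsilent.

```python
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order; "*" marks stop codons.
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def count_single_substitutions() -> tuple:
    """Return (nonsilent, silent) counts over all sense-codon point changes."""
    silent = nonsilent = 0
    for codon, aa in CODE.items():
        if aa == "*":
            continue  # substitutions starting from a stop codon are skipped
        for pos in range(3):
            for base in BASES:
                if base == codon[pos]:
                    continue
                mutant = codon[:pos] + base + codon[pos + 1:]
                if CODE[mutant] == aa:
                    silent += 1
                else:
                    nonsilent += 1
    return nonsilent, silent


nonsilent, silent = count_single_substitutions()
print(nonsilent, silent, round(nonsilent / silent, 2))
```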


DISCUSSION 


The unique region of MHV-A59 mRNA 3 contains the information for the viral peplomer
protein E2 (Rottier et al., 1981). The nucleotide and derived amino acid sequences of
the gene presented in this paper allow us to position several functional domains of
the coronaviral peplomer protein in the sequence. The predicted signal sequence at
the N-terminus is consistent with the finding that E2 is translated on ribosomes
bound to the rough endoplasmic reticulum (Holmes et al., 1984).

TABLE 1

MUTATIONAL DIFFERENCES BETWEEN MHV STRAINS A59 AND JHM

                Total number of mutated
Gene    N/S     Nucleotides (%)    Amino acids (%)
90B     0.46    253 (13.4)         67 (10.7)
90A     0.26    122 (6.7)          25 (4.1)
E1      0.50     21 (3.1)           7 (3.1)
N       0.53     98 (7.2)          29 (6.4)

Note. Mutations were scored from aligned sequences. The ratio of nonsilent to silent
(N/S) mutations was calculated based upon a method described by Nei and Gojobori
(1986). The scores for 90B were obtained by excluding the stretch in A59 that is
absent in JHM. 90B, the N-terminal cleavage product of the peplomer protein; 90A, the
C-terminal part; E1, matrix protein; N, nucleocapsid protein. Data are from Armstrong
et al. (1984b), Skinner and Siddell (1983), Pfleiderer et al. (1986), and Schmidt et
al. (1987).


The MHV trypsin cleavage site has been determined by analyzing the cleavage-generated
amino terminus and localizing it in the protein sequence. Cleavage of E2 by trypsin
is required for activation of the cell-fusing activity of the coronavirus (Sturman et
al., 1985). Cleavage activation of cell fusion is also found with the HA and F
glycoproteins of myxo- and paramyxoviruses, where a hydrophobic amino terminus is
involved in cell fusion (Gething et al., 1978; Richardson et al., 1980). The amino
terminal sequence of 90A shows no homology with analogous regions of HA2 and F1 of
myxo- and paramyxoviruses and does not have a similar highly hydrophobic character,
because it contains two charged residues (Arg and Glu). Moreover, there is no
sequence homology at the amino terminus of the trypsin cleavage site between the
spike proteins of MHV and infectious bronchitis virus (IBV; Binns et al., 1985),
although their positions are similar. Cleavage of E2 by thermolysin, which has a
specificity different from that of trypsin, also activates MHV-induced cell fusion
(Baker and Sturman, manuscript in preparation). This suggests that proteolytic
cleavage of E2 may expose a functionally important domain that is internal rather
than adjacent to the cleavage site. The sequence upstream of the cleavage site
resembles the consensus sequences of trypsin cleavage sites of several other
glycoproteins (Cavanagh et al., 1986).

Proteolytic cleavage of E2 appears to be an impor- 
tant determinant of MHV pathogenesis. Investigations 
are in progress to identify the host- and strain-depen- 
dent differences in the processing of E2. 

At its C-terminus 90A contains the highly hydrophobic potential membrane anchor of
the peplomer protein. A feature of this sequence is that it starts with a stretch of
eight residues, Lys-Trp-Pro-Trp-Tyr-Val-Trp-Leu, which appears to be identical in
coronaviruses MHV-A59, MHV-JHM, IBV-M41 (Niesters et al.,
1986), IBV-M42 (Binns et al., 1985), feline infectious peritonitis virus (FIPV; R. J.
De Groot et al., manuscript in preparation), and transmissible gastroenteritis virus
(TGEV; Jacobs et al., manuscript in preparation). This sequence apparently represents
a structural signal associated with membrane anchoring. Both E2 cleavage products in
virions have an apparent mol wt of 90K as determined by SDS-PAGE (Sturman et al.,
1985), but the ORFs of 90A and 90B differ in length and coding capacity. By
comparison, in MHV-JHM the E2 cleavage products also have an equal apparent mol wt,
of 98K (Siddell et al., 1981), yet in this strain the lengths of the 90A and 90B ORFs
are similar. Even if we take into consideration the inaccuracy of electrophoretic
size estimation due to different SDS binding capacities of the cleavage products, we
cannot exclude the possibility of extra or different processing of the A59 cleavage
products compared to JHM.

It is not clear whether the additional sequence is 
deleted in JHM in the course of evolution or inserted 
into A59, but it is important to notice that it starts in an 
eight nucleotide stretch 5'-TTAATGAT-3' (Fig. 2) that is 
repeated at the point where the sequences of both 
strains are in step again. This repeat is possibly in- 
volved in the creation of the genetic difference be- 
tween A59 and JHM. 
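
Locating such a flanking repeat is a plain substring search. In the sketch below the
eight-nucleotide motif is the one given in the text, while the surrounding toy
sequence and the function name are hypothetical; in the real A59 sequence the two
copies would be separated by the 267-nucleotide additional stretch.

```python
def find_all(seq: str, motif: str) -> list:
    """Return the 0-based start positions of every occurrence of motif."""
    hits = []
    i = seq.find(motif)
    while i != -1:
        hits.append(i)
        i = seq.find(motif, i + 1)  # step by one so overlaps are not missed
    return hits


MOTIF = "TTAATGAT"
# Toy sequence (hypothetical): two copies of the motif around a short spacer.
toy = "GGC" + MOTIF + "ACGTACGT" + MOTIF + "GCT"
positions = find_all(toy, MOTIF)
print(positions, positions[1] - positions[0])
```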

Apparently the 90B part of the peplomer protein can undergo radical changes without
losing its function. This is also reflected by the fact that 90B shows the highest
relative number of mutations. In contrast, 90A is less mutated and, more importantly,
shows a much lower ratio of nonsilent to silent mutations. This indicates a selection
against sequence changes. De Groot et al. (1987) compared the peplomer protein
sequences of coronaviruses from three different antigenic clusters and found that the
C-terminal parts were conserved whereas the N-terminal parts were not. They
demonstrated that the C-terminal sequence contained sequence patterns that could
explain the typical elongated form of the coronaviral spike. The negative selection
in 90A may therefore reflect preservation of structural features.

The fact that the ratio of nonsilent to silent mutations in 90B is comparable to that
in the nucleocapsid and E1 genes suggests that there is no stronger positive
selection mechanism, favoring escape mutations, in this part of the protein. Talbot
and Buchmeier (1985) tested a panel of neutralizing monoclonal antibodies to MHV-JHM
E2 on strain A59 and demonstrated that two conformation-dependent antigenic
determinants were not shared by JHM and A59 whereas a third, conformation-independent
determinant was found on both strains. From our data we suggest that the
conformation-dependent epitopes are on the more variable 90B part; the SDS-stable
site is probably situated on the structurally important and more highly conserved 90A
part of the MHV peplomer protein. Experiments are in progress to localize these
epitopes in the predicted amino acid sequence.